Embedded DRAM Has a Home in the Network Processing World

By Gord Harling
Integrated System Design
Posted 08/03/01, 10:14:09 AM EDT


Embedded DRAM is a highly desirable choice for many types of memory in network processors, for two reasons. First, it allows developers to extend on-chip memory to reach new levels of performance. Second, it brings memory that would otherwise sit off-chip onto the die, reducing parts count and power consumption while greatly increasing bandwidth through the use of wide buses of 512 bits or more.

In this article we show how embedded DRAM can reduce parts count and power consumption, and discuss where embedded memory fits in the memory requirements of network processors.

Data communications rates in the Internet backbone are doubling every six to nine months, far faster than Moore's Law. At the same time, the required level of service, and therefore the amount of data processing required, is also increasing. That combination of factors puts intense pressure on the equipment used to manage networking systems, and conventional techniques cannot cope for long.

Network processors were created to provide greater processing power for network functions when conventional processors could not satisfy the requirements. The network processor market is hotly contested, and the clear winners in this arena will offer superior performance and manufacturability. Those goals can only be achieved by implementing higher bandwidth and larger memories on-chip.

Embedded DRAM can be manufactured either in a specialty merged DRAM/logic process or as a planar variety in a pure CMOS process. The merged DRAM/logic process costs more in silicon but improves density by a factor of eight to 10 over SRAM, making it more suitable for high-volume devices with significant amounts of memory. Planar DRAM in a pure logic process carries no cost or complexity penalty and can improve density by a factor of two over SRAM. Historically, DRAM has been limited to sub-100-MHz random-access speeds, but recently 166-MHz random-access DRAM has been developed.

There are some architectural and behavioral differences between DRAM and SRAM that affect implementation. The differences relate to the page mode in DRAM and the requirement for DRAM refresh.

DRAMs store states as an analog voltage that must be “sensed” prior to a read. The sensing operation is performed in parallel on a full “page” of bits at once. Each page can contain many words and, once sensed, they can be accessed at high speed through multiplexers in what is called page mode. Depending on the memory design, a page may contain 64 kbits or more. Pages in separate blocks or banks can also be accessed through the fast multiplexers (see Fig. 1). Separate banks can also be used to perform multiple simultaneous operations such as write/reads and refresh operations, or banks can be shut down to save power when traffic is low.
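The performance gap between in-page and out-of-page accesses can be sketched with a toy timing model; in the Python sketch below, the cycle counts and page size are illustrative assumptions, not figures from any particular device:

```python
# Toy model of DRAM page-mode timing: the first access to a page pays the
# full sense latency; later accesses to the same open page go through the
# column multiplexers at much lower cost. All numbers are assumptions.
SENSE_CYCLES = 6    # assumed cycles to sense (open) a page
PAGE_CYCLES = 1     # assumed cycles for a page-mode (column) access
PAGE_WORDS = 2048   # words per page, e.g. a 64-kbit page of 32-bit words

def access_cost(addresses):
    open_page = None
    cycles = 0
    for addr in addresses:
        page = addr // PAGE_WORDS
        if page != open_page:       # page miss: sense the new page
            cycles += SENSE_CYCLES
            open_page = page
        cycles += PAGE_CYCLES       # column access within the open page
    return cycles

streaming = access_cost(range(100))             # one sense, then page hits
hopping = access_cost([0, 2048, 1, 2049] * 25)  # a sense on every access
```

Streaming 100 sequential words costs a single sense plus 100 fast column accesses, while the page-hopping pattern pays the sense penalty on every access.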

DRAM requires periodic refresh to restore the charge. A sensing operation restores the charge automatically, but if the memory is not read frequently enough, a dummy read operation must be performed.
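A refresh scheduler built on that observation might track when each page was last sensed and issue dummy reads only where traffic has not already done the job. A hypothetical sketch, with an arbitrary retention interval and page count:

```python
# Pages must be sensed within the retention interval. Normal reads refresh
# a page as a side effect; pages that traffic misses get a dummy read.
RETENTION = 64   # assumed retention interval, in arbitrary time units
NUM_PAGES = 8

last_sensed = {page: 0 for page in range(NUM_PAGES)}

def on_read(page, now):
    """Any read senses the page, restoring its stored charge."""
    last_sensed[page] = now

def refresh_due(now):
    """Pages needing a dummy read before their charge decays."""
    return [p for p, t in last_sensed.items() if now - t >= RETENTION]

on_read(3, now=10)          # page 3 is refreshed by normal traffic...
stale = refresh_due(now=64) # ...so only the untouched pages fall due
```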

Apart from those two differences, SRAM and DRAM are converging in most other aspects of implementation. SRAMs in the megabit range now require redundancy and built-in self-test (BIST) and are increasingly affected by soft errors.

Consider the requirements of a next-generation network processor (see Fig. 2). It requires four basic types of memory: packet memory, header memory, routing-table lookup memory and program memory.

Packet memory is a data buffer that contains the payload information to be transmitted. This data streams in and out of the processor, and so it is very suitable for DRAM with a fast page mode for sequential accesses. It is usually implemented with off-chip SDRAM or RDRAM, since packet memory tends to be large, on the order of 32 to 256 Mbytes.

Packet memory bandwidth is very important since it connects directly to the data stream. Packet memory also requires at least two ports so that it can be connected to the data stream and yet still accessed by the network processor when required.

There are trade-offs to be made with the packet buffer size to optimize the system efficiency. The minimum packet size should be about equal to the buffer or segment size. If the packet buffer size is too large, the memory efficiency is poor for minimum-size packets; if it is too small, then multiple buffers have to be used to hold most packets. That issue affects not only the memory efficiency but also the bandwidth requirement, since cycles used to write to unused memory locations are also lost. Another trade-off involves the size of the packet buffer and the speed of decision making. If forwarding decisions can be made quickly, then a smaller packet buffer can be used and it can be more easily pulled on-chip to save power and area.
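The efficiency side of that trade-off is easy to quantify: a packet occupies a whole number of buffers, so any bytes allocated beyond the packet length are wasted. A small Python illustration, where the 64- and 1,500-byte packet sizes are typical Ethernet figures used only as examples:

```python
import math

# Memory efficiency of a segmented packet buffer: a packet occupies
# ceil(length / buffer) buffers, and the remainder of the last buffer
# is wasted.
def efficiency(packet_bytes, buffer_bytes):
    buffers = math.ceil(packet_bytes / buffer_bytes)
    return packet_bytes / (buffers * buffer_bytes)

small_in_big = efficiency(64, 256)    # minimum-size packet, large buffer
matched = efficiency(64, 64)          # buffer matched to minimum packet
big_in_small = efficiency(1500, 64)   # large packet spread over 24 buffers
```

A 64-byte packet in a 256-byte buffer uses only a quarter of the memory it occupies, while a 1,500-byte packet in 64-byte buffers stays above 97 percent efficient at the cost of managing 24 buffers.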

Header memory is used to store copies of the header information associated with the packets. It is written in bursts and read randomly in small pieces. The processors read the header memory to obtain information about the packet and determine the appropriate method of processing the data. It is usually 256 kbytes (2 Mbits) and has low permanency, since its contents live only as long as the associated packets.

Lookup memory contains the routing table and is sometimes implemented as content-addressable memory (CAM). It is typically quite large and is often implemented off-chip. The size of this table can have a direct relationship to performance, up to a limit. The data in this memory is relatively stable and is written through a low-speed update port, but it must be read very rapidly and typically requires random access.

Program memory is used by the network processors to store their program code. It has high permanency and is often specified by users as having five-year data retention while powered. The program memory is typically 4k to 8k words, but the processor word may be 32 to 64 bits, resulting in a range of 128 to 512 kbits. The next generation of network processors is looking at a much larger program store, perhaps eight to 16 times that size.
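The arithmetic behind those figures, as a quick sanity check:

```python
# Program store in kbits = word count x word width / 1024, per the ranges
# quoted above (4k-8k words of 32-64 bits).
def store_kbits(words, word_bits):
    return words * word_bits // 1024

low_end = store_kbits(4 * 1024, 32)    # 4k words of 32 bits
high_end = store_kbits(8 * 1024, 64)   # 8k words of 64 bits
```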

Application-specific DRAM
The program and lookup memory have high permanency, and many manufacturers want the contents to be guaranteed for a worst-case life of five years. Recent data suggests that SRAM demonstrates soft-error failure rates on the order of 1,000 FITs/megabit. Since DRAM already carries a refresh-cycle overhead, the refresh period can be used to actively detect and correct errors in the memory without performance implications, providing unparalleled soft-error protection. A simple Hamming code can provide single-bit error correction with double-bit error detection, and the probability of multiple errors occurring between refreshes is extremely small.
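As an illustration of the coding itself, here is a minimal SEC-DED Hamming sketch over a 4-bit value: three Hamming parity bits locate any single flipped bit, and one overall parity bit distinguishes single from double errors. A real memory would protect much wider words, but the principle is the same:

```python
# Hamming SEC-DED over a 4-bit nibble: single-error correction,
# double-error detection. c[1..7] is a Hamming(7,4) codeword with parity
# bits at positions 1, 2 and 4; c[0] is overall parity over c[1..7].
def encode(nibble):
    c = [0] * 8
    c[3], c[5], c[6], c[7] = nibble
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]
    return c

def decode(c):
    """Return (status, data): 'ok', 'corrected' or 'double-error'."""
    syndrome = ((c[1] ^ c[3] ^ c[5] ^ c[7])
                | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
                | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2)
    parity_ok = (c[0] ^ c[1] ^ c[2] ^ c[3] ^ c[4] ^ c[5] ^ c[6] ^ c[7]) == 0
    if syndrome and parity_ok:
        return 'double-error', None   # two flips: detectable, uncorrectable
    if syndrome:
        c = list(c)
        c[syndrome] ^= 1              # one flip: syndrome names the position
        return 'corrected', [c[3], c[5], c[6], c[7]]
    return 'ok', [c[3], c[5], c[6], c[7]]
```

Run during refresh, the decode step costs no extra memory cycles: every page is already being sensed on schedule, so checking and scrubbing ride along for free.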

The header and packet memory have very low permanency. In many applications it may be possible to eliminate the refresh requirement. The residence time of the data is so short that the probability of soft errors on any particular packet is extremely low.

For packet memory, which generally streams data from inputs and outputs, we propose high-speed DRAM with a variable page mode. The variable page mode could be used to convert the page multiplexer from x4 to x8 to x16 operation on the fly, so that a wide range of packet sizes could be accommodated without memory waste.
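One way such a mechanism might choose its setting is to pick, per packet, the multiplexer width whose segment size strands the fewest bits. The selection policy below is our illustration, not a described mechanism; only the 64-kbit page and the x4/x8/x16 widths come from the text:

```python
import math

# Hypothetical per-packet mux selection: a wider mux carves the page into
# more, smaller segments, so small packets waste less of the page.
PAGE_BITS = 64 * 1024   # a 64-kbit page, as in the text

def best_mux_width(packet_bits, widths=(4, 8, 16)):
    def waste(width):
        segment = PAGE_BITS // width
        return math.ceil(packet_bits / segment) * segment - packet_bits
    return min(widths, key=waste)   # width that strands the fewest bits
```

A 3,000-bit packet would select x16 (4-kbit segments, about 1 kbit wasted), while a packet exactly filling a 16-kbit segment is served equally well at any width.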

Embedded DRAM can be implemented with very wide data buses so that memory accesses are performed on hundreds or thousands of bits at a time.
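The bandwidth leverage is straightforward to estimate by combining the 512-bit width above with the 166-MHz random-access rate mentioned earlier; the 64-bit comparison bus is our assumption for a typical external interface:

```python
# Peak bandwidth = bus width x access rate.
def bandwidth_gbits(bus_bits, mhz):
    return bus_bits * mhz * 1e6 / 1e9   # gigabits per second

on_chip = bandwidth_gbits(512, 166)    # ~85 Gbit/s over a 512-bit bus
off_chip = bandwidth_gbits(64, 166)    # ~10.6 Gbit/s over a 64-bit bus
```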

Header memory requirements are racing ahead of embedded technology, and economics may dictate that the header memory remain off-chip. Lookup memory can be built using DRAM CAM for significant gains in area and power. This technology has been pioneered by Mosaid for standalone 2-Mbit CAMs, but next-generation devices will contain at least 18 Mbits and will be difficult to integrate economically.

Program memory requires dual-port memory with one high-speed random-access port connected to the processor and a relatively low-speed port for program updates. The memory would include refresh, which would also perform ECC invisibly and would far exceed the five-year data retention specification required by most manufacturers.


 


All material on this site Copyright © 2001 CMP Media Inc. All rights reserved.